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Abstract. A maximum likelihood (ML) technique for detecting compact sources in images of the x-ray sky is examined. Such 
images, in the relatively low exposure regime accessible to present x-ray observatories, exhibit Poissonian noise at background 
flux levels. A variety of source detection methods are compared via Monte Carlo, and the ML detection method is shown to 
compare favourably with the optimized-linear-filter (OLF) method when applied to a single image. Where detection proceeds 
in parallel on several images made in different energy bands, the ML method is shown to have some practical advantages 
which make it superior to the OLF method. Some criticisms of ML are discussed. Finally, a practical method of estimating the 
sensitivity of ML detection is presented, and is shown to be also applicable to sliding-box source detection. 
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1. Introduction 

X-ray observatories from HEAO-2/Einstein onward have car- 
ried imaging cameras which have counted individual x-ray 
photons as they impinged upon a rectilinear grid of detector 
elements. The number of x-ray events detected per element 
within a given time interval will be a random integer which 
follows a Poisson probability distribution. 

As the sensitivity of detectors and the effective areas of x- 
ray telescopes have increased, so has the x-ray cosmic back- 
ground been increasingly resolved into compact sources. Eg 
a telescope such as XMM-Newton detects on average 70 
serendipitous sources within the field of view of each pointed 
observation (Watson et al 2008). A full characterisation of the 
x-ray background would shed light upon the evolution of the 
early universe (see Brandt and Hasinger 2005 for a recent re- 
view of deep extragalactic x-ray surveys and their relation to 
cosmology). It is therefore of interest to find the best way to 
detect compact sources in these images - that is, the detec- 
tion method which offers the best discrimination between real 
sources and random fluctuations of the background. 

In a previous paper (Stewart 2006 hereinafter called pa- 
per I) I discussed the general class of linear detection meth- 
ods, ie methods which form a weighted sum of the values of 
several adjacent image pixels, and optionally also over several 
different energy bands. A method for optimizing the weights 
was presented and shown to be substantially superior to the 
sliding-box method which, no doubt because of its simplicity, 
has been frequently used for compiling catalogs of serendipti- 
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tous sources (see eg DePonte & Primini [l993t Dobrzycki et al 
2000; also task documentation for the exsas task detect and the 
SAS task eboxdetect). 

The present paper should be considered as an extension to 
paper I and also partly as an emendation. Several issues dis- 
cussed in paper I either had not at that time been completely 
thought through, or were not described as clearly as they might 
have been. These issues are itemized as follows: 

- The difference between the distribution of samples of a ran- 
dom function, and the distribution of its peak values, was 
not realized. 

- The concept of sensitivity was not adequately defined. 

- The normalization of the signal did not allow direct com- 
parison between single-band and multi-band detection sen- 
sitivities. 

- Both paper I and the present paper are concerned not only 
with numerical values of the sensitivity of various meth- 
ods of source detection, but also with practical, approxi- 
mate methods of calculating these values 'in the field'. One 
may prefer a lengthy, complicated method for calculating 
values for a paper; such is unlikely to be an equally satis- 
factory field method. In paper I there was insufficient grasp 
of this conceptual distinction. 

- Sensitivities for the A^-band eboxdetect method were incor- 
rectly calculated (see section[4j2]for an explanation). 

The different signal normalization adopted in the present 
paper precludes direct comparison of the results herein with 
those of paper I; however after appropriate rescaling the two 
sets of results are consistent. No numerical errors in paper I 
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were found, despite extensive reworking of the software for the 
present paper. 

The primary aim of the present paper is to present a method 
of source detection based on the Cash statistic (Cash 1979) and 
to show that this is at least as sensitive as the optimized linear 
filter described in paper I. For this purpose, sensitivities of a 
variety of detection methods are compared across a range of 
background values for both single-band and multi-band detec- 
tion scenarios. The paper also aims to demonstrate other ad- 
vantages of the Cash source detection, and to discuss and in- 
vestigate some reported disadvantages. 

Section lXTl contains a mathematical description of the gen- 
eral source detection problem. In section [2721 a 3-part detection 
algorithm is described, in which a map of the detection proba- 
bility at each image pixel is made as step one, peaks (maxima) 
in this map are located in step two, and a source profile is fitted 
to each peak in step three. The relation between the respective 
probability distributions of the map vs peak values is also dis- 
cussed. Detection sensitivity is defined in section |2~3"1 

Cash-statistic source detection is described in section l2".4. II 
In section 12.4.21 it is shown how the Cash statistic may be 
used at both stages 1 and 3 of the 3-part source detection al- 
gorithm. Finally, in sections \2. 4. 31 and 12.4.41 objections to the 
Cash method are addressed. 

Section [3] contains the results of the sensitivity compar- 
isons. In sections [3~T1 and [3~2l details of the necessary calcu- 
lations are discussed. 1-band results are presented in section 
I3.3l and iV-band results in section l3~4l The degradation of some 
iV-band sensitivities when the true source spectrum does not 
match that assumed is discussed in section [3~5l 

An approximation useful for 'in the field' estimation of 
Cash sensitivities is presented in section 14.11 Finally, in sec- 
tion 14.21 a deficiency of the sliding-box method is discussed, 
and a method for remedying it is proposed. 

2. Theory 

2.1. Source detection 

In the situation relevant to the present paper we have a model 
function e(r) defined over the space r by 

M 

e(r)=B(r) + Y j a j Sj{r-roj) 

j 

where B is called the background, each a is a scalar ampli- 
tude and S is the signal shape. We conceive of a set of N po- 
sitions r, for i € [1,N], each with an associated measurement 
of flux Cj. Each a is a random integer with a Poisson distribu- 
tion about a mean given by the value <?, of the model at that 
position. From this comes the requirement that <?, > Vz. In 
mathematical shorthand 

M 

(d) = a = Bj + ^ cijS /ri - r 0J ). 

j 

In CCD x-ray detectors the c, are obtained by accumulation 
within arrays of voxels, ie volumes within the physical coordi- 
nates r. In this case the functions B and S are integrated over 



the z'th voxel dimensions to give the respective expectation val- 
ues Bj, S ij etc. For the purposes of the present paper it is mostly 
assumed that the coordinates r consist of just the spatial (x, y) 
coordinates of the x-ray detector; in some sections the energy 
of the x-ray photons is added to the set. In this situation each 
voxel is bounded in the (x, y) directions of course by the edges 
of the physical pixels of the detector. In the energy direction, 
voxels are bounded by the boundaries of any energy selection 
performed by the user. 

The ultimate aim of source detection is to obtain the best 
estimates possible of a and r$ for the most numerous possible 
subset of the M sources present. Satisfactory performance of 
this depends upon a couple of conditions. Firstly, the sources 
must not be confused - ie the density of 'detectable' sources 
should be« 1 /Ar$ where Ar$ is some characteristic size of 
S . For XMM-Newton this requires the detectable source den- 
sity to be less than about 10 5 deg~ 2 . The deepest surveys to date 
reach x-ray source densities of only a tenth of this value (Brandt 
and Hasinger 2005). On the other hand, it was shown in paper 
I that perturbations of the background statistics set in at much 
shallower levels of sensitivity; but still a little beyond the deep- 
est practical reach of XMM-Newton. There appears thus to be 
still some scope to improve source detection in the present ob- 
servatories without coming up against the confusion limit. To 
some extent also one can disentangle confused sources during 
the fitting process, by comparing goodness-of-fit for fitting one 
versus two or more separate source profiles to a single detec- 
tion. 

If we may neglect confusion then we can detect sources one 
at a time. The model formula can thus be simplified to 

ei = Bi + aS(ri-ro). (1) 

The second condition necessary to source detection is that 
the form of S is (i) known (at least approximately) and (ii) not 
too similar to B. In x-ray detectors used to date, a significant 
fraction of sources have angular sizes which are smaller than 
the resolution limit of the telescope. S for such sources is then 
simply the PSF of the telescope, which can be estimated a pri- 
ori, and which is often much more compact, ie with more rapid 
spatial variation, than the background. However, about 8 per- 
cent of sources detected by XMM-Newton for example are re- 
solved or extended (Watson et al 2008). Detection of a resolved 
source is much more difficult (or at least, less efficient) since its 
S is dominated by the angular distribution of source flux, which 
is not known a priori. In addition, there is of course no upper 
limit to the size of x-ray sources, and in fact the distinction be- 
tween source and background is largely a philosophical one. In 
practice one has to impose an essentially arbitrary size cutoff 
and consider only variations in x-ray flux which have spatial 
scales smaller than that cutoff to be sources, and take every- 
thing else to be background. 

Modern CCD detectors can measure not only the positions 
of incident x-ray photons but also their energies. This opens an 
additional degree of freedom for S and B and thus the possibil- 
ity of exploiting differences in their respective spectra to sep- 
arate them with even greater efficiency. The average spectrum 
of the sources in the 1XMM catalog is quite different from the 
instrumental background spectrum of typical instruments (see 
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Carter and Read 120071 and Kuntz and Snowden 2008] for re- 
cent data on XMM-Newton), and appears also to be different 
to the spectrum of the cosmic background (see eg Gilli et al 
2007 ). However if it is true, as most authorities now suppose, 
that practically the whole of the cosmic x-ray background at en- 
ergies higher than a few keV is comprised of point sources, one 
might expect the latter difference to diminish as fainter sources 
become detectable. 

Parallel detection of sources in images in several energy 
bands of the same piece of sky is complicated by the fact that, 
although all (point) sources have the same PSF at a given detec- 
tor position and x-ray energy, there is no corresponding com- 
mon source spectrum. This means that we cannot 'locate' the 
source S in the energy dimension - we must be satisfied with 
locating it spatially on the detector. 

Some of the detection algorithms described in the present 
paper avoid making any assumption about the source spectrum; 
others require the user to choose a spectrum a priori. As will 
be seen from the results, there is usually a trade-off involved, in 
that methods which require the user to select a spectrum tem- 
plate may be better at detecting sources with spectra close to 
that template, but worse where the source spectrum is very dif- 
ferent to the template; whereas a method which makes no as- 
sumptions will usually perform with about the same efficiency 
on all sources. 

2.2. Likelihood-map source detection 

The present paper treats of a generic source-detection proce- 
dure which falls into three parts. The first part calculates a map 
giving an initial, possibly coarse estimate of the probability, for 
each pixel, that a source is centred on that pixel. In other words, 
it tests, for each jth map pixel, the following flux model: 

e i = B i + aS(r i -r j ). (2) 

This task is accomplished by calculating some statistic U from 
the count measurements c, taken from some subset of the N 
detector pixels. The spatial distribution of this subset ought not 
to be much larger than the size of S - otherwise one is includ- 
ing pixels which one knows contain no information about S . In 
other words, the pixels chosen to calculate the statistic at de- 
tector pixel j are best located within a field of approximate size 
Ar$ , centred about pixel j. 

After calculating the statistic U for all pixels, this first part 
of the detection procedure converts this into a measure of the 
probability that the value of U at pixel j would arise by chance 
fluctuation of the background alone. 

The second part of the source-detection procedure consists 
of a peak detection algorithm. The output from this part is a list 
of peaks in the probability map, which are interpreted as source 
candidates. 

The final task cycles over this list of candidates and at- 
tempts to fit the signal profile to the field of c, values surround- 
ing each. To be of maximum use, the fitting procedure should 
be capable both of separating a single candidate into two or 
more confused sources, and merging multiple candidates which 
turn out to belong to a single source. 



In the XMM-Newton data-product pipeline, both stages 
1 and 2 are performed by the SAS task called eboxdetect, 
whereas the final stage is performed by emldetect. 

The final task may perform some additional calculations. It 
may calculate x 2 for the fit so as to aid in discriminating real 
cosmic sources from other phenomena, such as 'hot' CCD pix- 
els, which may give rise to excess counts (this is not at present 
done by emldetect). As is shown in section l2~4l use of the Cash 
statistic allows one to calculate a final value for the source- 
detection likelihood after the fit. 

This separation of detection and fitting stages is a useful 
procedure only where the image pixels are not significantly 
larger than the characteristic width Ar$ of S . This criterion is 
in fact achievable with the imaging cameras of all three current 
observatories primarily dedicated to detecting x-rays, namely 
XMM-Newton, Suzaku and Chandra (Struder et al 120011 and 
Turner et al I200T1 for XMM-Newton; Serlemitsos et al [20071 
and Koyama et al 2007 for Suzaku; and Weisskopf et al 2002 
for Chandra). 



2.2.1. Peak detection 

The peak-detection stage does not seem to have been the sub- 
ject of much study. It is problematical for two reasons. Firstly, 
the frequency distribution of values at the maxima of a ran- 
dom function (call it the 'peak' distribution) is not in general 
the same as the distribution of values over a set of randomly 
chosen positions (which we might call the 'whole-map' distri- 
bution). In general one would expect the former distribution to 
be somewhat broader than the latter, since the probability that 
a random pixel is a peak is small for relatively low values but 
large for high values. Nakamoto et al (2000) give a method 
to calculate the distribution of peaks for a random function of 
one dimension, but it is not clear if this theory can be extended 
to more than one dimension. However, as is shown in section 
I2.4.41 the Cash maximum likelihood statistic does provide a 
way to estimate the shape of the peak distribution, although not 
its normalization (which depends on the fraction of map pix- 
els which are peaks). The normalization must be estimated by 
appropriate simulations. 

The second problem is that the definition of a peak in an 
array of samples of a nominally continuous function is some- 
what arbitrary. One obvious choice of definition is to label any 
j'th pixel as a peak pixel if the value cj at that pixel is greater 
than the values at all other pixels within a patch of pixels sur- 
rounding the jth. But what size and shape to make the patch? 
It seems clear that these ought to match the size and shape of 
S - too large a patch might miss additional, nearby sources; 
whereas too small runs into the opposite danger of listing more 
than one peak for a single source. Also, it is not clear how this 
choice might affect the frequency distribution of peak values. 
This danger was pointed out by Mattox et al 0996) in respect 
of source detection in EGRET data. 
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2.3. Sensitivity 

The efficiency of a method of detecting sources is characterised 
by its sensitivity, which, broadly speaking, is the smallest am- 
plitude a which will result in a source signal aS being detected 
among background of a given level. While this sentence con- 
veys the general idea, a precise mathematical definition is nec- 
essary before a comparison can be done between methods of 
source detection. A somewhat stilted and obscure definition of 
sensitivity was presented in paper I. The present section is an 
effort to make this definition clearer and more rigorous. 

2.3.1. Definition 

Let us start by assuming that B and S are known a priori. 
Formally speaking this is not true: one ought to make use only 
of their best estimates B and S in the formula for any source de- 
tection statistic. In practice however, because B and S can often 
be quite well estimated by methods separate from the source 
detection itself, it is convenient for the sake of simplicity to 
make the approximation B ~ B etc. 

Each of the detection methods discussed in either paper I or 
in the present paper can be associated with a statistic U which is 
to be evaluated at positions of interest: perhaps at the centre of 
each image pixel, or perhaps at a set of positions of candidate 
sources. U at any position is evaluated, using a formula par- 
ticular to the detection method, from the measured counts c,. 
Clearly U so calculated is a random variable which will occur 
with some well-defined probability distribution p(U). We can 
also define an integrated (more strictly, a reverse-cumulative) 
probability P such that 

P(>U) = I dU p(U). 
Ju 

In order to understand how sensitivity relates to U it is help- 
ful to imagine an ensemble of measurements of a source of a 
given amplitude a. Or one could simulate this situation via a 
Monte Carlo experiment which, for each member of its ensem- 
ble, generates a set of c, values. The input model for this should 
consist of equation [2] and can make use of the a priori known 
shapes of B and S . For each member of the ensemble, U is cal- 
culated from the c, values. The resulting ensemble of U values 
will have a distribution which will in general be different for 
different a priori-assumed values of a. To reflect this, let us 
write both the probability density p and the reverse-cumulative 
probability P(>U) with an additional a argument as p(U;a) 
md P(>U;a). 

We are interested first of all in the predicted P in the case 
that a equals zero - ie, when there is no source in the detection 
field, only background. It is necessary to consider this scenario 
because the random nature of the data means that, at least some 
of the time, we will unavoidably be labelling as a source a con- 
centration of counts which is merely a random fluctuation of 
the background. 

The first step in source detection is to decide what fraction 
of these false positives we are prepared to put up with. This 
decision defines a cutoff probability Pdet which in turn is asso- 



ciated (through the function P(>U;0)) with a definite value of 

As an aside, note that in all cases in which the measured 
data are random integers, p(U; a) is a sum of delta functions, 
thus P(> U; 0) is a descending step function. Inversion of the 
latter to obtain t/det can clearly only give an approximation in 
which C/det is set to the next highest value of U for which p is 
defined. In this situation one may expect the sensitivity of the 
method to descend in steps with decreasing background. This 
effect can be seen in the plots of the sensitivity of the simple 
sliding-box detection scheme in figures [5] [7] and [8] 

The sensitivity Cdet of a detection method is here defined to 
be that value of a which, for given B and S , gives rise to values 
of U such that (U) — t/det, where t/det is obtained from fdet as 
described above. See figure Q] for a diagrammatic example of 
this. The algorithm for calculating sensitivity is thus as follows: 

1 . Decide a value for fdet- 

2. Invert the relation P = P(>U;0) (solid line in figure Q} to 
obtain Ud et from Pdet)- 

3. An expression for the average value of U as a function of 
alpha is inverted to give a^et =< U > 1 (t/det)- 

Note that other definitions are possible, for example one 
could define such that the median of U{a,\ et ) instead of 
its mean was adjusted so as to equal C/det- I have preferred the 
definition in terms of the mean simply because it is generally 
easier to calculate. However in the limit of high counts, all the 
varieties of U here studied tend to a gaussian distribution, for 
which the mean and median are equal. Within the range of 
background values treated in the two papers, the median and 
mean definitions empirically are found to produce similar val- 
ues. 

The mean definition is equivalent to the 'counts amplitude' 
concept employed in paper I; however the median definition is 
implicit in figure 12 of that paper. 

The curves in figure Q] were constructed via Monte Carlo. 
At each iteration, simulated data values were generated for a 
small (9 x 9) patch of pixels; the value of the appropriate de- 
tection statistic U was then calculated for the patch. The end 
result of the Monte Carlo is an ensemble of U values. The 
data values were Poisson-distributed integers whose mean val- 
ues were given by a model comprising flat background of 1 .0 
counts/pixel plus a centred source profile. The solid curve and 
the error bars represent Monte Carlo-generated ensembles with 
10 6 and 10 5 members respectively. 

As mentioned in section lZSl a multi-band detection method 
which does not require the user to make an a priori choice of 
source spectrum is expected to perform better at 'across the 
board' detection, because of the fact that cosmic x-ray sources 
don't adhere to a common spectral template. Note however 
that, regardless of the requirements of a (multi-band) detection 
method, calculation of its sensitivity always requires the user 
to choose a spectral template. Eg in the Monte Carlo thought 
experiment described above, if a is non-zero, S needs to be 
completely defined over the whole space r, which in the multi- 
band case includes an energy dimension, in order to generate c,- 
values and ultimately a value for (U). 
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Fig. 1. The solid curve and the error bars (the latter shown in 
red in the electronic version) represent the results of separate 
Monte Carlo experiments, as described in the text. The solid 
curve shows the reverse-cumulative probability P{>U; a) in the 
case that the model consists entirely of background (ie, a = 0). 
The horizontal dotted line shows the chosen detection proba- 
bility fdet- Its interception with the solid curve gives the cutoff 
value t/jet °f the detection statistic U. The error bars show the 
approximate course of the probability density function p{U) 
for a case in which the model contains a non-zero source com- 
ponent. In the present case the model source amplitude a has 
been adjusted such that (t/(adet)) = t^det- This is the required 
condition for a to equal the sensitivity adet- 

2.3.2. Map vs peak sensitivity 

In paper I it was implicitly assumed that if one made a fre- 
quency histogram p ?ea k(U) of U values at an ensemble of lo- 
cations of detected sources, that this would be related to the 
probability distribution p msLp (U) of U values at all image pixels 
by the following proportional relation: 

/TT\ ^image 
^beam 

where Aj mage is the image area in pixels, and Abeam represents a 
constant known as the 'beam area', taken to be approximately 
equal to the FWHM area in pixels of the PSF. If this were true, 
the cutoff probability Pdet could be obtained by multiplying the 
maximum acceptable number of false detections per unit image 
area by the beam area. I now realize however that this assump- 
tion is not true: as described in section I?. 2. II and demonstrated 
in figure [3] the shapes of the two distributions are different. It 
follows then that one ought to use the probability distribution 
of U values at peaks in the map, rather than at all map pixels, 
in calculating sensitivity. 

This is a pity, because there are several attractive features 
of calculating 'map' rather than 'peak' sensitivities. Three of 
these have present relevance. The first and most obvious of 
these 'attractions' is the retention of consistency with paper I. 

Secondly, it is usually much easier and more straightfor- 
ward to calculate 'map' sensitivities. An adequate closed-form 



approximation for the 'map' version of the cumulative proba- 
bility function P exists for at least some of the statistics con- 
sidered herein, whereas the 'peak' versions of P must, for all 
but the Cash statistic, be estimated via a lengthy Monte Carlo. 
In addition, calculation of 'peak' distributions naturally also 
requires the addition of a peak detector to the algorithm; the 
Monte Carlo must create a large image rather than a small patch 
of pixels; boundaries of same have to be avoided, and so forth. 

The third thing which makes map rather than peak calcula- 
tions attractive arises because the true number density of false 
detections (indeed of detections of any kind) depends not just 
on the formula for U employed, but also on the nature of the 
peak-detection algorithm. Having to consider the effect of dif- 
ferent algorithms within a 'space' with two degrees of freedom 
rather than one is an unwelcome complication. 

None of these nice features of the 'map' calculation would 
justify its use, however, if it were not for the fact, as pointed 
out by Nakamoto et al (2000), that the 'map' and 'peak' dis- 
tributions of a random function become asymptotically similar 
towards high values of the function. This convergence is ob- 
served for example in figure [3] Since source detection by its 
nature deals with extreme values of the null-hypothesis proba- 
bility function, it was felt to be acceptable to retain the 'map' 
calculation for section [3] in which sensitivities of the various 
methods are compared. However when it comes to calculation 
of the absolute sensitivity of the Cash (or any other) statistic, 
of course the true 'peak' version of P must be employed. 

2.4. The Cash statistic 
2.4.1. Definition 

Cash d 1 979b discussed a ^ 2 -like statistic for Poisson data. 
Cash's statistic was constructed in two stages. The first stage 
comprises the log-likelihood expression 

N 
i=l 

where <?,■ is a model function for the expectation value (c,). L 
can be seen to consist of the logarithm of the product of the 
individual Poisson probability densities /?(c, | e,). If the model e 
can be expressed in terms of one or more adjustable parameters, 
we can define the best fit values of these parameters as those 
which produce the largest value of L. Note that the factorial 
term is not a function of e and thus can be neglected for fitting 
purposes. 

The second stage of Cash's development was to form the 
difference 

f/cash = 2(max[L ? ] - L nuii ) (3) 

where L nu u is L calculated by setting all parameters of the 
model e to fixed, background values and max(L 9 ) for integer 
q is shorthand for the maximum result obtainable by varying a 
pre-selected number q of parameters of e. 

So long as the null hypothesis model can be expressed by a 
combination of legal values of the q free parameters, t/cash > 0. 
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According to Cash's theory, under this and other conditions dis- 
cussed below, t/cash in the asymptotic case is distributed ap- 
proximately as^, ie^ 2 for q degrees of freedom. 

Just as with x 1 f° r gaussian data, the Cash statis- 
tic can be used not only to fit parameter values of the 
model but also to obtain information about the probabil- 
ity distribution of those values. Cash for example uses it 
to derive confidence intervals for fitted parameters. The 
XMM-Newton SAS task emldetect (online description at 
http://xmm.vilspa.esa.es/sas/current/doc/emldetect.ps.gz) em- 
ploys the Cash statistic firstly to obtain best-fit values for the 
position, amplitude and optionally extent of sources in x-ray 
exposures, and secondly, at the conclusion of the fitting routine, 
to calculate a detection probability for each so-fitted source. 
Calculation of the detection probability is done by setting the 
source amplitude to zero in the expression for L nu \\. The proba- 
bility of obtaining the resulting value of t/cash is then equivalent 
to the probability of obtaining the fitted values of the source pa- 
rameters when there is nothing in the field but background. 

Descriptions of the use of the Cash statistic for source de- 
tection can be found in Boese and Doebereiner (2001) and 
Mattox et al ( 1996 ) to give just two examples. The expressions 
developed in the present paper are not based on either treatment 
but comparison reveals close similarities. 

2.4.2. Map vs peak Cash 

In the present paper, the Cash statistic is evaluated under two 
different scenarios: either at the centre of each image pixel to 
make a map of t/cash samples, or at the best-fit locations of de- 
tected sources. In the first case, the source is taken to be centred 
on the pixel in question (equation |2](: the amplitude is the only 
fitted (ie free) parameter. In the second case, the x and y coor- 
dinates of the source centre are fitted as well as the amplitude. 
It is as well to consider how the U values returned by these 
respective situations are related. 

In either case, practical considerations limit the number of 
image pixels included in the calculation of each value of i/cash 
to a relatively small patch surrounding the pixel of interest. As 
can be seen from figure [6] however, the size of this patch has 
an asymptotically decreasing effect on the result of the calcu- 
lation as this size becomes much larger than the characteris- 
tic size of the PSF. Consider then an ideal, continuous, two- 
dimensional function C/ideai obtained by calculating <7cash, with 
only the model amplitude free, at all points within the image 
boundary. Samples of this smooth function on the grid of points 
at the centres of the image pixels represent the asymptotic val- 
ues of the actual, practical map-<7 evaluation in the limit of 
large 'patch' size. The values at peaks of [/ideal bear the same 
relationship to the actually calculated values of peak- U at the 
end of the final fitting procedure. The addition of two more fit- 
ted parameters does not change the value of U at these peaks: 
only the appropriate probability distribution. 

These ideas can be expressed more compactly if we intro- 
duce a little notation. Let t/(A map ) (J - be the 'map' value of U 
evaluated at the pixel centred on (Xi,yi), for a patch area A map ; 
let t/(xo,yo, Apeak) be the value at the fitted source location 



(xo,yo), for patch area Ap^; and let Ua es \_{x, y) be the ideal, 
smooth function. Then 

Uifed{xj,yi) = lim t/(A map );j 
and 

f/ideaK-^Cyo) = lim f/(X0,y0,^ P eak)- 

Apeak ->°° 

Imagine the 3 -part detection scenario of section l2~2l with 
the Cash prescription used both to calculate the map of U val- 
ues in step 1 and to fit the source parameters in step 3. As 
mentioned above, an advantage of the Cash statistic is that it 
can then be used to calculate a final value of U (thus of the 
null-hypothesis probability) for each fitted source. If the im- 
age pixels are smaller than the characteristic size of the PSF 
(mentioned in section l2~2l as a condition for the usefulness of 
the 3-part algorithm), and if we assume (as seems reasonable) 
that the maxima in J/ideai are asymptotically paraboloid, then 
we expect two things: firstly, that the positions output by step 
2, ie positions of maxima in the array of 'map' values produced 
by step 1, will be, most of the time, within 0.5 pixels of the fi- 
nal fitted positions; secondly, and because of this, that the final 
U values will be approximately the same as, but slightly larger 
than, the U values at the step 2 grid positions. These results are 
borne out by Monte Carlo experiments. 

The second expectation, plus the fact that, as mentioned in 
section 1231 the respective tails of the 'map' and 'peak' distri- 
butions of U are expected to be similar, together mean that we 
may investigate the sensitivity of Cash statistic source detec- 
tion, or at least compare it with other methods, via the (much 
more convenient) 'map' process alone. 

Since this paper shows that the Cash statistic is at least as 
sensitive as, and more flexible than, an optimized linear filter, 
it might be supposed that a scheme of source detection which 
employed Cash for stage 1 as well as stage 3 would be op- 
timum in practice. However, although it has been found con- 
venient to make use of this scheme as described above in the 
present paper, it can be shown that this is probably not the best 
scheme for practical source detection. In fact it isn't necessary 
to use the Cash statistic for step 1 as well as step 3; nor is this 
likely to be an optimum use of computer resources. An alterna- 
tive method is well exemplified by the current XMM-Newton 
detection scheme, in which the detection statistic of step 1 is 
calculated using a sliding-box formula (see equation lB.il ). The 
sliding-box statistic is much quicker to calculate than the Cash 
statistic, and its relatively poor sensitivity is offset by setting 
the pass threshhold low enough to pass any possible source 
through to step 3. Since step 3 does employ the Cash statis- 
tic, the full sensitivity of the latter is retained without a great 
penalty in computing time. 

For purposes of the present paper, where the amplitude 
alone is fitted, this was done via a Newton-Rhapson method, as 
detailed in appendix [A] Where more than one parameter was 
to be fitted, the Cash statistic was maximized via a Levenberg- 
Marquardt method (following the algorithm described by Press 
et al l2003l) . 
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2.4.3. Objections to ML measures of significance 

Various authors (eg Freeman et al l 19991 Protassov et al 2002 
Bergmann and Riisager l20"02l Pilla et al 120051 and Lyons 120071 1 
have criticized the use of Maximum Likelihood statistics for 
the detection of sources or spectral lines in Poisson-noise data. 
There seem to be two main concerns: firstly that the parameter 
values which return the null hypothesis lie on the boundary of 
the space of permissable values; secondly that the data to which 
a line (or source) profile is fitted are also those used to estimate 
the background (null hypothesis) signal. 

The first objection appears to be a result of confusing 
physical with statistical constraints. Statistically speaking, for 
Poisson data, there is nothing wrong with a fit which returns a 
negative value for the amplitude, provided the net model flux 
B + aS remains non-negative. Even if one specifies that the 
physical model at issue does not allow negative amplitudes (al- 
though it is easy to conceive of situations which could give rise 
to a dip in flux at a particular position or energy), the fitted am- 
plitude a and the model amplitude a are not the same thing: 
the former is only an estimator for the latter, thus is a random 
variable with a certain frequency distribution. Where the model 
amplitude is small compared to the scatter in the fitted ampli- 
tude, it is necessary to allow negative values in order for the fit- 
ted amplitude to be an accurate estimator for the model. Figure 
^demonstrates this. (Figure|2]was constructed via Monte Carlo 
in a way similar to figureQ] The ensemble comprised 10 5 mem- 
bers.) 

A negative value for a which is small compared to its un- 
certainty cr a , where the physical model specifies that a > 0, 
merely indicates that the measurement is consistent with zero. 
In this case one could use the sensitivity of the statistic as an 
upper limit. If a is negative but large compared to cr a , the con- 
clusion is rather that there is probably something wrong with 
the physical model (although, of course, such values will oc- 
cur with a certain frequency by chance). In neither case are 
negative values of a somehow incorrect. The only requirement 
which the statistical theory imposes in cases where the noise 
is Poissonian is that the total flux (background plus source) 
may not be negative. Thus one can conclude, in contradiction 
to Protassov et al (2002), that the null value of a, ie zero, does 
not lie on the boundary of the permitted values of a. 

Precisely what one does with negative a values depends on 
one's aim. If that aim is to obtain the best estimate of a, as 
in the above paragraphs, then negative as should be retained. 
However if one desires to compile a list of candidate sources, 
then fits giving negative a ought probably to be discarded from 
the list. In no case is it correct to keep the candidate but tamper 
with the a value (by, for example, setting it to zero). 

It should be pointed out that policy toward negative & val- 
ues should differ depending on whether the detection is over a 
single spectral band or over several in parallel. For single-band 
detection it is appropriate to discard candidates for which a is 
negative - indeed, if a different algorithm is used for stage 1 of 
the detection process, as in the XMM-Newton pipeline, such 
candidates may never be submitted to Cash fitting in the first 
place. 
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Fig. 2. The error bars are the result of a Monte Carlo experi- 
ment as described in the text. They show the frequency distri- 
bution of values of the source amplitude a, fitted by use of the 
Cash statistic. All members of the Monte Carlo were generated 
using the same model amplitude (3.0 counts), indicated in this 
figure by the vertical dotted line. Negative values of fitted a 
were allowed; but the fit was not permitted to return a value 
less than the statistical limit (indicated with an arrow and the 
label 'Minimum a), which is dictated by the necessity for the 
total model flux to remain > for all pixels. The pile up of 
these bounded fits can be seen in the high numbers within the 
first bin. It can be clearly seen that it is necessary to retain the 
negative values of fitted a if the mean of the distribution is to 
be a good approximation to the model value of a. 

The situation is slightly more complicated in the multi-band 
case. Because it is impossible to define a single source spec- 
trum, it may be desirable to allow the flux in each band to be a 
free parameter - ie to fit the flux au in each kth band separately. 
Since one can easily conceive of a source which is bright in 
total but very faint in one of more spectral bands, it is entirely 
appropriate to retain negative values for ri^s in the final source 
list. As mentioned above, all that this implies is that the flux of 
that source in that band is undetectable. 

Clearly, a criterion to decide whether a given multi-band, 
free-spectrum detection should be retained or not should in- 
volve some function of the fitted fluxes in all bands. It is just 
not quite clear what this function should be. Simplest may be to 
simply add the fluxes and retain only those sources for which 
the result is positive. 

The validity of the second objection to maximum likeli- 
hood fitting, namely that the background and signal amplitude 
are obtained from the same set of data, depends on the circum- 
stances of the detection. However, in XMM-Newton practice 
at least, the fraction of image area occupied by identifiable 
sources is usually small. In this case it seems a reasonable pro- 
cedure to make an estimate of background from those regions 
of the image which one has concluded probably don't contain 
any detectable sources. The XMM-Newton pipeline performs 
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a simple 2-step iterative procedure to calculate a background 
estimate by this method. Since one is in this case not using the 
same data for both background estimation and source fitting, 
the second objection does not apply. 

Protassov et al propose the replacement of the ML statistic 
in these problems by a Bayesian technique. While admittedly 
more rigorous, their technique would require at least a fresh 
Monte Carlo to be performed for each source. Protassov et al 
in their spectral line examples deal in significances on the or- 
der of a few percent. To calibrate the distribution function of 
their detection statistic, a Monte Carlo of a few hundred ele- 
ments suffices. Source detection however typically demands a 
much more stringent significance level: 3 x 10~ 4 for 1XMM 
(Watson et al 2003) for example. Monte Carlo ensembles with 
on order 10 5 or more members are required to calibrate dis- 
tributions down to this level. Such a technique is not practical 
for the compilation of a source catalog with upward of 50 000 
members. 

2.4.4. Empirical verification of ML source detection 

An empirical look at some Monte Carlo data shows indeed that 
the raw distribution of values of the Cash statistic does not ap- 
pear to conform very closely to the ideal \ 2 curve. This is so 
whether one sets the value of t/cash to zero whenever the fitted 
amplitude is negative (as in the examples of Protassov et al) 
or whether one simply retains such values unaltered. However, 
discarding negative-amplitude fits from the pool of source can- 
didates, as described in the preceding section, brings the distri- 
bution of the remainder very close to the y 2 theoretical curve. 
Figure[3]shows the results of such a Monte Carlo. 

For figures [3] and |4j it was not possible to deal only with a 
small patch of simulated-data pixels, such as for figures Q] and 
12 because in the present case it was necessary to perform all 
three steps of the generalized source-detection algorithm (sec- 
tion l2.2b . For this reason a much larger, circular field was used, 
with a radius (204 pixels) similar to that (~ 210 pixels) of the 
field of view of the XMM-Newton EPIC cameras, when ex- 
pressed in the standard 4-arcsec pixels of the data products. No 
sources at all were used in the input model for the Monte Carlos 
of figures [3] and HI all detections depicted in these figures are 
'false positives'. 

The penalty for improving the shape of the distribution by 
discarding negative fits is that one needs to estimate the ex- 
pected fraction of such fits in order to normalize the distribu- 
tion. When considering the statistics of 'map' -stage Cash (ie 
fitting of amplitude alone at each pixel of an image), this frac- 
tion asymptotes to 0.5 in the limit of high background (see eg 
the zero intercept of the solid curve in figure [3]). This value has 
been found to be an acceptable approximation throughout the 
range of background values used in the present paper. 

Where the source position as well as amplitude is fitted us- 
ing the Cash statistic, this simple approximation for the frac- 
tion of image pixels which yield candidates no longer applies. 
In this case one also needs to know the fraction of pixels which 
are peaks. There appears to be no way to estimate this apart 
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Fig. 3. Reverse-cumulative frequency histograms from Monte 
Carlo ensembles of the Cash statistic C/cash- Continuous curves 
plot the theoretical distributions, which are just the x 2 distribu- 
tions for the respective n degrees of freedom, suitably normal- 
ized to the respective fraction of pixels involved. Crosses, cir- 
cles and diamonds represent Monte Carlo data. The solid curve 
(theory) and crosses (Monte Carlo) show results where only the 
source amplitude is fitted (= 1 degree of freedom), for a source 
centred at each image pixel. The diamonds give the distribution 
of U values for that small subset of these 'map' pixels which 
were identified as peaks. Finally, the dotted curve (theory) and 
the circles (Monte Carlo) show the distribution of values ob- 
tained by fitting x and y coordinates as well as amplitude (= 3 
degrees of freedom) for each of the 'map' peaks. 

from a Monte Carlo. One can derive the number a posteriori 
via a plot such as figure [4] 

Figure H] was created as follows. An ensemble of (false) 
source detections was created via a Monte Carlo as described 
above in the description of figure [3] The detection software 
makes an estimate of the probability P that a source of that 
amplitude or greater will occur by chance from background. 
If that estimate is proportional to the true probability, then the 
proportionality may be expressed as follows: 

N(>P) ~ MfieldsAfield/peaks-P 

where N(> P) is the number of sources in the ensemble hav- 
ing P or greater, Mfi e id s is the number of fields, Ag e id is the 
area of the field in pixels, and / pea ks is the fraction of image 
pixels which are identified as peaks (ie, positive-valued local 
extrema). (Note that the diamonds, circles and dotted line on 
figure[3]should all trend to / pea ks as U approaches zero.) Taking 
logs and manipulating gives 

where the replacement L = - \n(P) has been made since this is 
the value written by emldetect to the column named DET_ML 
in the output source list. 
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Fig. 4. A plot which allows the fraction of peak pixels / pea ks to 
be estimated. This value is just the height to which the curve 
asymptotes at small values of L. As described in the text, a flat 
slope for such a curve indicates that the shape (but not necessar- 
ily the normalization) of the empirical probability distribution 
agrees with that predicted by theory. The black bars show the 
results of the bespoke Cash fitting software; the half-tone bars 
(red in the electronic version) show the results of the SAS de- 
tection procedure. Both plots are truncated some way short of 
zero because only candidate sources having a 'map' probability 
above a given cutoff were submitted to the fitting procedure. A 
number of computational reasons led to the choice of a much 
larger cutoff for the SAS fitting task (emldetecf). 

For each point on figure [4] N is calculated as the number 
of sources having that value of L or higher. The error bars are 
just y/N. Thus N increments as one proceeds from the highest 
L value to the lowest. 

Results for two source-detection procedures are plotted on 
figure |4] The black bars show results from software written 
for the present paper which performed all three stages of the 
source detection (section 12.2b . The half-tone bars (red in the 
electronic version) result from use of the XMM-Newton SAS 
tasks eboxdetect followed by emldetect. Actual XMM-Newton 
product file headers were used in the simulated image, expo- 
sure map and background map files in order to facilitate accep- 
tance of these by the SAS source detection tasks. 

The asymptotic flatness of the figure[4]plots again supports 
the close correspondence between the actual distribution of val- 
ues of the Cash statistic and the^ 2 distribution predicted by the 
Cash theory. 

Finally, although it is beyond the scope of the present pa- 
per to perform detailed tests on emldetect, one can at least say 
from figure|4]that the value of DETJVIL is being calculated cor- 
rectly. This seems not to be consistent with the results shown 
in figure 4 of Brunner et al 2008 which were also calculated 
from simulated data. However, their simulation was much more 
sophisticated: every field contained very many faint confusing 
sources, and a statistical procedure had to be used to identify 



'false positives' . This may explain the discrepancy with present 
results. 

3. Sensitivity comparison 

3.1. The model 

The flux model used in the simulation was based on equation 
[2] The physical coordinates here of interest are the two spatial 
coordinates x and y, representing position on the detector (or, 
equivalently, position on a projection plane tangent to the ce- 
lestial sphere), plus the energy E of the x-ray photons. Samples 
along both x and y axes are taken to be equally spaced, located 
indeed on an orthonormal array, centred on the source centre 
(xo,yo); but samples may be unevenly spaced in energy. It is 
convenient to modify equation [2] to make use of three indices, 
one per coordinate, viz: 

Ci,j,k = Bi,j,k + ij,k- 

Both i and j are restricted to the range —M to M for some small 
integer M. The symbol N is henceforth used to denote the num- 
ber of energy bands. 

The signal samples are related to the PSF S (x, y, E) as fol- 
lows: 

Jr^Xi+Ax/2 r*yj+Ay/2 
I dxdyS(x,y,E k ) 
Xi-Ax/2 Jyj-Ay/2 

where Ax and Ay are the respective sizes of the pixels in the x 
and y directions and = (E^io + £^hi)/2, ie the average of the 
bounding energies of band k. S is normalized such that 



1 



AxAy 



Xoo /-*oo N 
I dxdy^ S (x,y, E k ) = 1. 
CO xj —CO k~\ 



(4) 



(Note that this is not the same normalization condition as used 
for paper I.) Hence, a has units of x-ray counts and may be 
interpreted directly as the expected total number of counts, 
summed over all N bands and over the entire detector plane, 
which will be generated by the source S during a given expo- 
sure duration. 

As for paper I, both Ax and Ay are set to the same value 
of 4 arcsec, which is the same as the pixel size of the x-ray 
images produced as standard XMM-Newton products, and as 
used for source detection in both the 1XMM and 2XMM cata- 
logs (Watson et al l2003l and 120081 . 

The PSF used, for both single-band and multi-band cal- 
culations, was that corresponding to the pn camera of XMM- 
Newton on the optical axis, and at a photon energy of 1 .25 keV, 
which is the centre of band 2 of the 1XMM catalog x-ray data 
products (Watson et al 2003). In paper I the PSF was allowed 
to vary in shape with energy band, as it is observed to do in 
XMM-Newton. This is discarded in the present paper in order 
to avoid confusion between the small effect due to the change in 
shape of S with energy and the large effect due to its changing 
magnitude. 

The expected background flux fiy^ in counts per pixel is 
taken throughout to be spatially invariant, but in general dif- 
ferent within each energy band. Source and background spec- 
tra used, plus the associated energy band definitions, are taken 
from table 1 of paper I. 
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3.2. The sensitivity calculation 

As noted in section 1231 the first step in source detection, thus 
also in calculating the sensitivity of that detection, is to decide 
the acceptable probability of a false detection. Here, as in paper 
I, a value of exp(-8.0) is used for this probability cutoff Pdet- 
This was the value used in the making of the 1XMM catalog 
(Watson et al 120031 

The next step is to invert the appropriate expression for P(> 
U; 0) to obtain i/det- In a minority of cases, an accurate closed- 
form expression for P is known: in these cases, P was inverted 
by use of Ridders' method (Ridders fl 9791 1 as described in Press 
et al (2003 ). Ridders' method was chosen rather than a Newton- 
type method because of a paucity of expressions for dP/dU. 

In the majority of cases, no closed-form expression or ap- 
proximation for P is known which is accurate enough for 
present purposes. For each detection method without an accu- 
rate formula for P, a Monte Carlo technique was used to esti- 
mate J/det- An ensemble of rj sets of cy^ were generated from a 
model comprising purely the background B^. U was calculated 
for each set of counts values. The final ensemble of U m values, 
for m e [l,rj], was sorted into increasing order. £/det is then 
approximately that value of U m for which (r\ - m)jr\ ~ Pd et . 
Clearly rj must be chosen such that m » 1 . 

Having obtained t/det, the final step is to invert an expres- 
sion for {U) to derive a^et- An accurate and invertible expres- 
sion for {U) could be found for the majority of the detection 
methods. Where not, a Monte Carlo technique had to be used 
for this step of the calculation as well. In this onerous and 
somewhat problematic procedure, a Ridders iteration was em- 
ployed to converge on a value of a^et- For each trial value in the 
iteration sequence, an ensemble of sets of c,- ^ was obtained. 
Each set yielded its value of U. The mean of the ensemble of 
U values was tested against Udet and the trial a adjusted ac- 
cordingly until convergence was judged to have been reached. 

3.3. 1 -band results 

Three single-band detection methods are compared here, viz: 

1 . Sliding-box: U is obtained simply by summing the c,- values 
within the box. 

2. Optimized linear filter: U is obtained via a weighted sum of 
the c,-, the weights being chosen to optimize the sensitivity. 

3. Cash-statistic detection: U given by the Cash statistic 
where amplitude only is fitted. 

Formulas for P(>U ; 0) and (U) are given in appendix lB.il 

Sensitivities of the three methods are compared for a range 
of values of background flux B in figure [5] All values were 
calculated for a square array of pixels with M — 4. 

The reason the sliding-box curve shows stepwise discon- 
tinuities is because (as described in section 12.31 1 U for any 
method of detecting signals in Poissonian data must be a dis- 
crete function, not defined for all values on the real line, due 
to the discrete nature of the counts value in any pixel. All the 
P functions, and any expression derived from them, thus de- 
scend in step fashion. This is most obvious for the sliding-box 
method because U for this method is highly degenerate - ie 
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Fig. 5. Minimum detectable source counts a^t, plotted as a 
function of background flux B; single-band case. The solid 
line gives the results of the sliding-box technique; the dia- 
monds show the results of detection by matched linear filter; 
the crosses show the results of the Cash method. 



there are many arrangements of counts c, which will produce 
the same U. The steps are still present for the other methods 
but are usually too finely spaced to be noticeable. 

As one might expect, both the matched-filter and Cash 
methods are seen to be more sensitive (by a factor of at least 
20% at any value of background) than the sliding-box method. 
This is because they both make use of more information, 
namely the shape of S , to better discriminate between S and 
B. What is not so expected is that the Cash method appears 
to be marginally better than the matched-filter method. At first 
it seemed possible that this was an artifact, a result of insuffi- 
cient convergence of the Powell's-method routine which calcu- 
lates the weights for the matched-filter method; however, tests 
using much stricter convergence criteria failed to reveal any 
significant dependence. The matched filter detection is by def- 
inition the optimum linear filter; however the Cash detection 
algorithm is based however on a nonlinear expression, which 
is optimized not just for the general conditions but for the par- 
ticular arrangement of counts within each (2M + 1) by (2M + 1) 
detection field. One must assume that this accounts for its better 
performance. 

Figure |5] is comparable to figure 6 of paper I, with the ad- 
dition of the points for the Cash results. Some differences can 
however be observed between the sensitivity values displayed 
in figure|5]and those given in the earlier figure. There are three 
reasons for this. Firstly, the normalization expression for S is 
different: in paper I, the normalization integral for S extended 
only as far as the boundary of the square (2M + 1) by (2M + 1) 
patch of pixels in which counts were considered; in the present 
paper the integral extends over the whole plane. 

Secondly, a smooth approximation to P(>Ubox', 0) was used 
in paper I, rather than the true step-function form used here. 
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Fig. 6. The sensitivity of different detection methods as the size 
of the 'detection field' is varied. The field side length is 2M + 1 
pixels. Results are presented for three values of expected back- 
ground flux B, given in counts per pixel. Those results belong- 
ing to the same method and the same value of B have been 
linked with dotted lines to improve the legibility. The squares 
represent the sliding box convolver; diamonds, the matched fil- 
ter method; and crosses, the Cash method. 



Thirdly, whereas in the present paper the same value of 
M — 4 has been used for all methods, in paper I the values for 
the sliding-box method were calculated using M = 2. Figure 4 
in paper I was provided to allow the reader to estimate the ef- 
fect of this choice. This figure is repeated here as figure|6l with 
the Cash results included. 

The points to note from figure [6] are twofold. The first is 
that, for both the matched-filter and Cash methods, the mini- 
mum detectable source count always decreases as the size of 
the field is increased, whereas the sliding box method reaches 
a minimum, with a slight dependence on the background level. 
The reasons for this were explored in paper I but, to reprise 
briefly: when confronted with a large field, most of which must 
be almost pure background, the sliding box method adds in 
all the background-dominated counts with equal weight, which 
therefore increases the total noise; whereas the matched-filter 
method assigns smaller weights to pixels as the fraction of S to 



B decreases. The same is true of the Cash method, for which 
the fit is dominated by pixels having a high S /B ratio. 

The second point to observe in figure [6] is that all meth- 
ods become of roughly similar worth as M becomes small. For 
M — 0, corresponding to a detection field consisting of a single 
pixel, one would expect them to become identical, since any 
knowledge of the shape of S is useless in this case. The reason 
for the persistence of some differences at M = 0, particularly 
at low background, is unknown. However, the differences occur 
within a regime for which the expected total number of counts 
is less than 1. One might expect most measures to be a little 
'noisy' under such circumstances. 

3.4. N-band results 

Five multi-band detection methods are compared here, viz: 



1. 



3. 



Sliding-box: U is obtained simply by summing the c,- values 
within the box, over all bands. 

eboxdetect: this is also a sliding box method, but differs 
from the previous in that calculation of U is two-stage. The 
first stage calculates Uk for each kth band by summing val- 
ues as usual. U is then obtained from the Uk via a formula 
(see appendix lB.2.2b for which the theoretical underpinning 
is unclear. 

Optimized linear filter: U is obtained via a weighted sum of 
the Cj over all pixels and bands, the weights being chosen 
to optimize the sensitivity. The complete form of S , thus of 
the source spectrum, must be assumed. 
Cash detection, fixed spectrum: U is given by the Cash 
statistic where amplitude is fitted for all bands in parallel. 
A source spectrum must be assumed. There is thus only 1 
degree of freedom in the fit. 

Cash detection, free spectrum: same as the preceding, ex- 
cept that the amplitude is fitted for each band separately, 
giving N degrees of freedom, for N bands. 



Formulas for P(>U ; 0) and (U) are given in appendix[ 

Sensitivities of the five A^-band methods are compared for 
a range of values of background flux B in figure [7] All values 
were calculated for a square array of pixels with M = 4. 

The background flux B in all figures relating to multi-band 
detection was calculated by summing the background Bk in 
each band. This provision, together with the normalization of S 
described in equation@] allows one to compare directly single- 
to multi-band detection. This was not the case in paper I. 

The same changes and improvements as described in sec- 
tion [33]hinder direct comparison of figure |7]of the present pa- 
per with figure 9 of paper I. However if one increases all values 
in figure 9 paper I by about 40% to compensate for the dif- 
ferent normalization of S in that paper, the respective curves 
(diamonds in both papers indicate the matched-filter results, 
whereas the Monte Carlo-corrected eboxdetect results are indi- 
cated by crosses in paper I and squares in the present paper) are 
seen to match reasonably well. 

The normalization condition for S chosen in the present pa- 
per does however facilitate comparison between the 1 -band and 
A^-band situations. The summed, sliding-box detection method 
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Background flux B (counts/pixel) 

Fig. 7. Minimum detectable source counts a^ et , plotted as a 
function of background flux B; A^-band case. The solid line 
gives the results of the technique in which all bands are 
summed before sliding-box is applied (method 1 this section). 
Squares represent the results of the eboxdetect sliding-box 
technique (method 2). Diamonds show the matched-filter re- 
sults (method 3). The Cash method in which a source spec- 
trum is assumed (method 4) is represented by crosses; the alter- 
nate, assumption-free Cash method (method 5) is represented 
by filled triangles. 



is seen, as expected, to provide identical results here as in the 
single-band case. Some of the other A^-band methods are con- 
ceptually similar to 1-band counterparts, and have been given 
the same graph symbols to highlight this fact; however, only for 
the sliding-box method are the A^-band and 1-band sensitivity 
values expected to be identical. 

The first thing to note is that the eboxdetect version of 
sliding-box, in which the likelihoods for each band are summed 
to give U, performs slightly better than the simplistic summed- 
band version. Why? The answer is that the simple sum actually 
implies an assumption about the source spectrum, namely that 
it is fiat - or, more precisely, that the source spectrum is such 
that the optimum weights per band are identical. This method 
might therefore be expected to perform better than the eboxde- 
tect method on sources which obey this criterion, and worse (as 
in the present case) when they do not. However there seemed 
little point to the present author in testing this speculation be- 
cause the summed-likelihood sliding box method is in any case 
far from optimum. 

Again the equivalent matched-filter and Cash methods, 
namely methods 3 and 4 above, represented respectively by di- 
amonds and crosses, perform about the same. The improved 
perfomance of the Cash method at low values of background is 
no doubt due to the same reasons as in the single-band case. 

As one might expect, the version of the Cash statistic which 
does not require an a priori decision about the source spec- 
trum performs less well than methods which do, at least in 
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Fig. 8. Same as figureQexcept a hard source spectrum has been 
used to generate the data. 



the present case in which the true source spectrum matches the 
template. The contrary case is examined in the next subsection. 

3.5. Effect of non-matching spectra on sensitivity 

Figure [8] is similar to figure [7] except that there is now a mis- 
match between the spectrum used to generate the counts (and 
to calculate the sensitivity) and the spectrum assumed in the de- 
tection process. Such a mismatch is of course only meaningful 
for those methods which require a spectrum to be assumed. The 
hard source profile in table 1 of paper I was used to generate 
the Monte Carlo data, whereas the more representative source 
spectrum, the same used for both purposes in figure[7l has been 
used in the appropriate detection schemes. 

The summed-band sliding-box method again provides the 
unvarying benchmark. Among the remaining methods, two 
trends are evident: those methods (respectively eboxdetect and 
free-spectrum Cash) which do not assume a source spectrum 
perform about the same as previously; whereas those which do 
(fixed-spectrum Cash and matched-filter) perform now signifi- 
cantly worse. This is as one would expect. 

It is interesting that the fixed-spectrum Cash and matched- 
filter methods now diverge quite markedly at low values of 
background. The reason for this behaviour is not known. 

4. Some (U) and P difficulties 

4.1. An approximate expression for(Uc afi h) 

As described in appendices IB. 1.3IIB. 2. 41 and IB. 2. 51 no closed- 
form expression for (C/cash) is known. In its absence, inversion 
to derive a&t from t/det must proceed via a tedious and prob- 
lematic Monte Carlo. Such an approach is clearly going to be 
inadequate for 'in the field' calculations of sensitivity. The fol- 
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Fig. 9. Comparison between the approximate expression for 
(£/cash) developed in section |4~T1 and Monte Carlo estimations 
of it. The Monte Carlo points are indicated by circles centred 
on error bars (many of the error bars are too small to be visi- 
ble on this plot). All values were calculated for the single band 
case with M — 4. Half-tone diamonds, connected by dotted 
lines (both coloured red in the electronic version), indicate the 
sensitivities appropriate to each background value. 



lowing expression appears however to offer an acceptable ap- 
proximation for such cases: 



(t/Cash) ~ 1 + 



+ aSj] In 



Bi + aSi 



Bi 



J-2a^S ; . (5) 



This expression is compared to Monte Carlo data for several 
values of background in figure [9] 

Equation [5] was obtained by a series of guesses at reason- 
able approximations. Initially, a formula was sought for the 
case in which negative-amplitude fits are not discarded. The 
additional 'fudge factor' of unity produced a function which 
gave an excellent fit to the Monte Carlo data for such a case. 
Discarding negative fits, as is done for all results in the present 
paper, degrades this close match at low amplitudes. However, 
the approximation is seen to be a good match at a values cor- 
responding to realistic sensitivities (the half-tone diamonds in 
figure |5J>. For this reason it is considered to be useful for the 
construction of sensitivitity maps for Cash source detection. 

4.2. A Cash statistic for sliding-box 

There are two problems in the calculation of (U) and P for 
the eboxdetect multi-band algorithm: firstly, there appears to be 
no closed-form expression for (U); secondly, the best available 
approximation for P is significantly inaccurate at low values of 
background. 



4.2.1. Incorrect (U) 

In paper I, sensitivities for the eboxdetect detection method 
were calculated using the assumption that (U) could be approx- 
imated by 



(f/)~-2^1n(P[>([/ t ); a =0]). 

k=l 

P(>(Uic);a = Q) here is given by 



(6) 



P{>(U k );a=G) = 1-2 



where Q is the complementary incomplete gamma function^ 
defined by 



Q(a, x) 



i r 



dt exp(-t)f 1 . 



This expression was numerically inverted by use of Ridders' 
method to derive the sensitivity adet from Udet- Monte Carlo 
experiments show, however, that equation [6] overestimates ordet 
by about 10% over most of the background range. It is simply 
incorrect and should not be used. 

4.2.2. Inaccurate P 

Figure 3 of paper I shows that use of the eboxdetect formula 
for P (ie, the formula used to calculate the values written by 
eboxdetect to the LIKE column of its output source list) can be 
incorrect by as much as an order of magnitude at moderately 
low values of background. Without an understanding of the the- 
oretical underpinnings of the eboxdetect method, it is difficult 
to suggest a rigorously better alternative. All sensitivity values 
calculated for this method in the present paper therefore made 
use of a Monte Carlo approximation. 

4.2.3. A more accurate alternative 

One way around these difficulties is to recast the eboxdetect 
algorithm in terms of the Cash statistic. This probably doesn't 
make the method any more sensitive, but it does make its sen- 
sititivity (and its detection likelihoods) easier to calculate. 

Since we are not using the information in the spatial varia- 
tion of S , both background and counts can be added across the 
detection field to give, respectively, a single number for each 
band. We define t/BoxCash as 



f^BoxCash = ^ Uk 



1 The somewhat awkward notation 1 - Q is here chosen, instead of 
the more compact and natural symbol P = 1 - Q, representing the 
other incomplete gamma function, merely in order to avoid using P 
for more than one quantity. Other alternative representations for the 
incomplete gamma functions alas offer no improvement since they in- 
volve additional mentions of the gamma function T(a). 
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where Uk has the usual Cash form, even though there is here 
only one 'pixel' to sum over: 

j , , / #total,* + &kS total,* \ „ 

Uk = 2 CtotaU m v - 2akb total,* 

\ D total,* / 

for 

M M 

fitotal,* = ^ 

i=-Af ;=-Af 

etc. In this instance it is easy to deduce that, for the Mi band, 

Ctotal,* — ^total.i 

a k = — = ; 

>J total,* 

The expression for Uk thus becomes 

U k = fitotai,* - Ctotai,* In ( ( ^ louU I where C to tai,* > 

\» total,*/ 

= Stotai,* otherwise. 

P is as given in the appendix IB. 2. 51 and we can now use the 
approximation in equation [5] to estimate (and therefore invert) 
(U). Monte Carlos show that the Cash^ 2 prediction remains a 
good fit to P, with the usual x0.5 normalization. 

5. Conclusions 

This paper has attempted to set out clearly the issues involved 
in the detection of sources in images which exhibit Poisson 
noise. A three-stage detection algorithm was proposed in which 
the first stage makes a map of the detection likelihood at each 
image pixel, the second searches for peaks in this map and the 
third fits a source profile to each peak so as to return best-fit 
values and uncertainties for the amplitude and position of each 
source. Some conditions which the data must obey in order for 
this approach to succeed were examined. The proper way to 
define detection sensitivity was also rather carefully gone into. 

It was realized after the issue of paper I that the probability 
distribution of map values is not in general the same as the dis- 
tribution of values at a set of pixels restricted to peaks in that 
map. This has implications for the calculation of the sensitivity 
of a detection method, since the starting point in this calcula- 
tion, namely the acceptable maximum in the number density of 
false detections returned by the method, translates to a cutoff 
value in the peak rather than the map distribution. On the other 
hand, because the map and peak distributions can be shown to 
converge at high values, it was felt allowable to use the map 
distribution as a proxy for the peak one when comparing sensi- 
tivities of different detection methods. Thus the results of paper 
I are not invalidated and could be extended here. 

The Cash likelihood-ratio statistic (LRS) as a means of 
detecting sources amid Poissonian noise was here described. 
This statistic has been used for source detection at least in 
Rosat images (Cruddace et al l 19771 and Boese and Doebereiner 
[200Tb and also for XMM-Newton (Watson et al [2003l [2008l l. 
According to Cash ( 119791 ), or rather to Wilks d 19631 ) as cited by 
Cash, such an LRS ought to be distributed as y 2 with q degrees 
of freedom, where q is the number of free parameters. It has 



however been claimed that the LRS is not well suited to source 
detection because the probability distribution of values of such 
a statistic calculated under the null hypothesis (ie that the image 
contains no sources) is not well known. Protassov et al (j2002) 
for example maintain that the source-detection application of 
LRS disobeys some of the necessary regularity conditions un- 
der which the distribution of an LRS can be expected to follow 
that of x 1 - In the present paper this claim is disputed; Monte 
Carlo experiments are also described which show, under typi- 
cal x-ray instrument conditions, and provided that sources with 
negative fitted amplitude are discarded from the ensemble, that 
the distribution of the Cash statistic, evaluated for images sim- 
ulated from a source-free flux model, adheres in shape at least 
quite closely to a% 2 distribution with the appropriate degrees of 
freedom. All that is necessary is to evaluate the normalization 
constant between the two. 

Although it was found convenient for purposes of the 
present paper to write bespoke code to perform the Cash- 
statistic source detection and fitting, the XMM-Newton SAS 
task emldetect was briefly compared to this bespoke code, and 
its results shown to also obey the^ 2 distribution. 

The main thrust of the present paper has been to com- 
pare the sensitivity of different detection methods. The Cash- 
statistic detection is shown to be the best of the methods tested. 
In terms purely of sensitivity it is seen to be only marginally 
superior to an optimized linear filter; however the Cash method 
is shown to possess two further advantages. Firstly, it is more 
flexible, since it allows the user the option of assuming a source 
spectrum or leaving this free. Secondly, for the final detection 
probability, one need not just accept the 'map' value at the pixel 
nearest to the final source position: instead one can calculate a 
final, accurate value specific to that fitted position. 

The flexibility of the Cash statistic was also demonstrated 
by showing how it can be adapted to calculate detection likeli- 
hoods for sliding-box source detection as performed for exam- 
ple by the SAS task eboxdetect. The advantage in doing so is 
that one can make use of the demonstrated close agreement be- 
tween the distribution of the Cash statistic and;^ 2 . This would 
represent a considerable improvement over the likelihood for- 
mula employed within eboxdetect. 

As a final result, the paper describes approximations which 
can be used to calculate sensitivity maps appropriate to Cash- 
style source detection. For XMM-Newton the only previous 
ways to do this have been via the SAS task esensmap (which 
examination of the code shows contains an algorithm which 
could only have serendipitous success), or via an empirical cor- 
rection to sliding-box sensitivities (eg Carrera et al 2007 ). 

Acknowledgements. Thanks are due firstly to Georg Lamer for pro- 
viding the vital clue to understanding the Cash algorithm; and sec- 
ondly to Anja Schroder for valuable assistance in obtaining XMM- 
Newton data. 
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Appendix A: Cash amplitude fitting 



To calculate a map of the Cash statistic (section l2.4.U in a sin- 
gle spectral band, we centre a PSF S on each pixel of the image 
and fit only its amplitude a. The fit is performed over a small 
number N pixels surrounding the pixel for which we are cal- 
culating the value. L au \\ is calculated by setting a to zero. The 
resulting probability is then the probability of obtaining the fit- 
ted value a when there is in fact nothing there but background 
B. The expression, derived from equation[3] for the Cash statis- 
tic relevant to this case is 



J3 




Fig.A.l. An example plot of equation [Al with 3 singularities. 
The dashed portion represents values beyond the 'statistical 
limit'. 

where a is the fitted value of a. 

As mentioned in section 12.4.31 statistically speaking it is 
permitted for a to be negative; whether we later discard such 
occurrences depends on the physical model which is appropri- 
ate, and also on what we wish to do with the data. What is not 
permitted is for B,+aS , to be negative for any i. (If the recorded 
counts at the ith pixel is > 0, B t + aS , is not permitted to equal 
zero either - but this is a formal condition which has no effect 
on the computational practicalities.) The absolute limit on per- 
mitted a values in the negative direction is thus - min(B/5'). 

If c, = for all N pixels in the fitted patch, a should be 
set to this limit. This, the model with the least amount of flux 
which the parameter limits permit, is the one most likely to ex- 
plain the lack of counts. If some c, > though, of course a may 
be larger than - min{B/S). This possibility can be explored by 
taking the derivative of U with respect to a and setting it to 
zero. This gives 



= 2 §My- 2 2>= a <A - 2> 



da 



Equation IA.2I can be solved numerically for a, using for ex- 
ample the fast and robust Newton-Rhapson method (cf Press et 
al 120031) . To help in visualizing how to apply this it is useful 
to look at a diagram. Figure |A~T1 shows an example plot of the 
function 



y(a) 



I 



Ci 



Bi/Si 



f/Ca.sh = 2 2j °i ln 



Bi+aSi 
B~ 



2&y Si 



(A.l) 



y has M hyperbolic singularities, one for each of the subset 
of M values of c,- which are > 0. Although formally speaking 
there are M solutions to equation IA.2I (indicated in figure I A. 1 1 
by intersections of the graph with the horizontal line at y — 
2 S ,), the largest solution is clearly the one desired. 

Newton-Rhapson solution can be facilitated in two ways: 
by estimating bounds for a and by performing a coordinate 
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transform. Firstly, let us label as the j'th that pixel for which the 
following conditions hold: firstly that c > for that pixel and 
secondly that, of that subset of M pixels for which this holds, 
B/S is the smallest. Clearly the most positive singularity occurs 
at a — -(B/S)j. a is bounded as follows: 

The rate of convergence of the Newton-Rhapson algorithm 
can be accelerated by performing a coordinate transform 

u = l/(a + Bj/Sj). 

The 'statistical' bounding condition on a, namely a > 
-min(B/S), must be applied if the Newton-Rhapson solution 
exceeds it, as is possible. 

Appendix B: Formulae for sensitivity calculations 

B.1. 1 -Band detection 

Expressions for U, P(>U; 0) and (U) are given below for the 
three single-band detection methods compared in section I3.3I 
The k index is superfluous in the present single-band scenario 
and has therefore been suppressed for the time being. 



B.1.3. 1-band Cash-statistic detection 



B.1.1. 1-band sliding-box 

M M 



(B.l) 



i=-M j=-M 



P(>U; 0) = 1 - QiU, [2M + l] 2 x B Uj ), 

where Q is the complementary incomplete gamma function as 
previously. 

MM MM 

i=-M j=~M i=-M j=-M 

B.1.2. 1-band optimized linear filter 

M M 

i=-M j=-M 

where the Wjj are optimum weights, calculated by a procedure 
described in paper I. 

/ B' U 
P(>U;0)~Q' 



Wequiv VV e q U i v 



where 



i=-M j=-M 



and 



_ Li=-M L j=-M W i,j JS 'J 

EqU1V ~ VM Z R ' 

Zui=-M Li j=-M w iJ°iJ 



<'• >- X X ^jiBij + aS.j). 



Expansion of equation IA.1I to include the extra spatial index 
gives 



mm j T5 . « c \ MM 



i=-M j=-M 

P(>U;0)~Q(l/2,U/2). 



i=-M j=-M 



There appears to be no easy route to an exact, closed-form ex- 
pression giving (C/cash) in terms of B, a and S (although see 
section |4~T| for an approximate formula). One can estimate the 
a appropriate to a given value of (U) via iterative Monte Carlos 
as described in section I3.2I in fact this must be done if one 
wants to investigate the effect on the Cash detection algorithm 
of fitting with the wrong shape S . All the sensitivity values for 
the Cash algorithm presented in the present paper have there- 
fore been estimated by this method. 

B.2. N-band detection 

Expressions for U, P(>U; 0) and (U) are given below for the 
five single-band detection methods compared in section l3~4l 

B.2.1 . Sliding-box detection on a single, all-band 
image 

This is exactly the same as the algorithm described in section 
IB. 1 . H The sum over bands is here just made explicit. 

This method does not require the user to choose a source 
spectrum template. 

M M N 

u =1l XX c <^' 

i=-M j=-M k=\ 

N 

P(>U; 0) = 1 - Q(U, [2M + l] 2 x J] B iJM ), 
and 

<^ = X XX B ^ + «X XX s * 



M M N 



k=\ 



M M N 



i=-M j=-M k=\ 



i=-M j=-M jfc=l 



B.2.2. Sliding-box detection, eboxdetect statistic 

This is the detection method employed by the XMM- 
Newton SAS task eboxdetect (online description at 
http://xmm.vilspa.esa.es/sas/current/doc/eboxdetect.ps.gz). 
The method does not require the user to choose a source 
spectrum template. 

U is calculated in a three-step process. In the first step, 
counts within the box are summed for each kth band to give 
Uk- The second step forms P(>Uk', 0) for each k. The final U is 
then given by 



i=-M j=-M 



N 

t/ = X L * 



(B.3) 



k=\ 



I. M. Stewart: Maximum-likelihood detection of sources among Poissonian noise 



17 



where 

L k = -2HP[>U k ;a=0J). 

No description of eboxdetect appears to have been published; 
hence the theoretical basis of this expression is obscure. 

If one assumes (which is not strictly true) that the above 
follow distributions which don't change with k, one can apply 
the Central Limit Theorem to equation |B. 3 1 to give, in the limit 
of high N, the following forrriforP: 

P(>U; 0) = 0.5(1 + Q[0.5, U 2 l2Ncr 2 L ]) 

where cr 2 L is the variance of Lj for any k. However in eboxdetect 
practice, still evident in the code in eboxdetect version 4. 19, the 
actual expression used is 

P(>U;0)~ Q(N,U/2); (B.4) 

Some experimentation in both paper I and the present paper 
confirms that equation lB.41 does appear to match Monte Carlo 
results in the limit of high background. The issue is in any case 
academic for present purposes, since as in all cases where only 
an approximate expression for P is available, the results pre- 
sented here are derived by use of a Monte Carlo estimation of 
P. 

No closed-form expression for (U) exists: it must be calcu- 
lated numerically. 

B.2.3. Af-band optimized linear filter 

Essentially this is the same algorithm as described in section 
IB. 1.21 just with the sum over the N bands made explicit. This 
method requires the user to choose a source spectrum template, 
in order to calculate the optimum weights w. 

M M N 

^ X X J] w iJ* c uX, 

i=-M j=-M k=l 

P is the same, but with of course 

M M N 

«' -XX X uv -^ u 

i=-M j=-M k=l 

and 

V* V M V iv d 
2j,=-M Zj j=-M 2jjt=l w ij,k a '<j£ 

WeqU ' V = vM yN Z, R ' 

M M N 

{U) = X X Yu Wi ^ Bi -i- k + aS U*>- 

i=-M j=-M k=\ 

B.2.4. Cash-statistic detection with a spectrum 
template 

Essentially this is the same algorithm as described in section 
IB. 1.31 just with the sum over the bands made explicit. This 



algorithm requires the user to choose a source spectrum tem- 
plate. 

M M N i D ,~c \ MM 

i=-M j=-M k=\ V ''■> ' i=-M j=-M 

P is the same as in section |B. 1.31 and (U) is again estimated 
via a Monte Carlo procedure (though see section |4~TT >. 

B.2.5. Cash-statistic detection without a spectrum 
template 

The difference between this method and the preceding one is 
that here a value of a is calculated, ie the source PSF is fitted to 
the data, separately for each band. No knowledge of the source 
spectrum is required. U is then formed by summing the Uu for 
each band. Since there are now Af degrees of freedom in the fit, 
P is here given by 

P(>C/;0)~ Q(N/2,U/2). 

(U) is again estimated via a Monte Carlo procedure (though 
see section |4~TT i. 



2 This derivation makes use of the identity erfc(x) = 2(0-5, x 2 ). 



